On 2020-01-14 I run the analysis of Gene Ontology terms. Here I want to focus on part of those results and make some findings clearer. I will use results only from gene annotation, and not from annotation of transcripts, which is a somewhat noiser dataset. I will not look at the functions affected by hatching condition, since it only affects a couple of genes, to be inspected later. Among all cellular functions significantly enriched with genes differentially expressed among selective regimes, I will focus on those that are most significant, and detailed enough. For example, I skip terms like transmembrane transport, proteolysis or protein phosphorylation, because they are too general. Sections below are named for the functional categories that I think are worth discussing. I will try to contextualize the results. My entry point to the literature is (Garcia-Roger et al. 2019).
I need the results of the differential expression analysis (2020-01-08) to identify the genes annotated with the significant GO terms, and to assess the direction of gene expression change between selective regimes.
library(variancePartition)
library(GO.db)
library(ggplot2)
library(stringr)
ANNOTATION <- '../2019-07-26/genes/annotation.txt'
EXPRESSION <- '../2020-01-08/genes.RData' # variance partition and the mixed models fitted with dream()
The annotation table includes only the most specific terms that can be assigned to a gene. The more general, ancestor terms are assumed, of course, but not explicitly mentioned in the annotation. For the purpose of retrieving all genes annotated to a term, I need to expand the annotation, to make ancestor terms explicit. It took me a while to realize that I can do this with the unlist() function:
annotation <- read.table(ANNOTATION, col.names = c('gene', 'GOterms'), colClasses = c('character', 'character'))
head(annotation)
## gene GOterms
## 1 XLOC_000002 GO:0008417|GO:0016020|GO:0006486
## 2 XLOC_000009 GO:0043130|GO:0005515|GO:0043161
## 3 XLOC_000010 GO:0005840|GO:0015935|GO:0003735|GO:0006412
## 4 XLOC_000015 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
## 5 XLOC_000021 GO:0016567|GO:0006397|GO:0061630|GO:0008270
## 6 XLOC_000036 GO:0005272|GO:0006814|GO:0016020
annotation.list <- strsplit(annotation$GOterms, split='|', fixed=TRUE)
names(annotation.list) <- annotation$gene
# append can only join two vectors at a time. It works with lists!
allAncestors <- append(as.list(GOBPANCESTOR),
append(as.list(GOMFANCESTOR),
as.list(GOCCANCESTOR)))
fullAnnotation <- lapply(annotation.list,
FUN = function(x){
unique(append(x,
unlist(allAncestors[x], use.names=FALSE)))
})
load(EXPRESSION)
Without any attempt to characterize gene function any further, I will make the assumption that the more genes in a pathway are expressed, the more active the pathway is. It is, however, conceivable that some genes are annotated as involved in a function that they regulate negatively.
Dormant eggs are embryos. RNA was extracted either after forced diapause or not, but under hatching conditions. That is, a variable portion of eggs are expected to be leaving the dormancy stage and resuming metabolism, development and hatching. The portion of eggs getting ready to hatch is known to be larger in populations in regular environments.
getGenes <- function(goterm, annotation = fullAnnotation, varpart = varPart) {
genes <- names(annotation[grep(goterm, annotation, fixed=TRUE)])
if (! is.null(varpart)) {
genes <- genes[genes %in% row.names(varpart)]
}
return(genes)
}
volcanoPlot <- function(fit=fitmm, Coef='regime', num=length(fitmm$F), term=goterm, genes=getGenes(term)){
DE <- topTable(fit, coef=Coef, number=num)
DE$annotation <- factor('other', levels=c('other', Term(GOTERM[term])), ordered=TRUE)
DE[row.names(DE) %in% genes, 'annotation'] <- Term(GOTERM[term])
DE <- DE[order(DE$annotation),]
ggplot(data=DE, mapping=aes(x=logFC, y=-log10(P.Value), color=str_wrap(annotation, width=30))) + geom_point() +
labs(color='Annotation')
}
It is among the most significantly enriched terms, with only 13 genes annotated.
goterm <- 'GO:0006289'
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
The expression of NER genes is higher in dormant embryos from populations evolved in the unpredictable environment. No idea why.
goterm <- 'GO:0007186'
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
Among the 206 genes in this category, 55 have an adjusted p-value lower or equal to 0.1. There are too many genes annotated with this function to plot all their stratified expression levels. I select the most differentially expressed ones.
Among the top 206 genes analysed from the G protein-coupled receptor signaling pathway, 173 have a reduced expression level in the unpredictable environment, and 33 show an increased expression level. Thus, in general we expect dormant eggs from populations under the unpredictable regime to be less sensitive to this particular pathway of signal transduction.
goterm <- 'GO:0007165'
genes <- getGenes(goterm)
volcanoPlot()
544 genes are involved in signal transduction.
It seems that in general, signal transduction activity must be lower in resting eggs from populations under the unpredictable regime. This suggests that in unpredictable environments resting eggs rely less in (unreliable) environmental cues, and probably more in endogenous ones.
goterm <- 'GO:0007160'
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
Among the top 16 genes involved in cell-matrix adhesion, 14 have a reduced expression level in the unpredictable environment, and 2 show an increased expression level.
Cell-matrix adhesion is likely related to embryonic development.
Several functions related to ion transport are significant. I wonder to what extent their significance is not dependent on a common subset of genes.
goterm_K <- 'GO:0006813'
goterm_COOH <- 'GO:0046942'
goterm_Sulf <- 'GO:0008272'
goterm_Nucl <- 'GO:1901642'
goterm_TM <- 'GO:0034220'
genes_K <- names(fullAnnotation[grep(goterm_K, fullAnnotation, fixed=TRUE)])
genes_COOH <- names(fullAnnotation[grep(goterm_COOH, fullAnnotation, fixed=TRUE)])
genes_Sulf <- names(fullAnnotation[grep(goterm_Sulf, fullAnnotation, fixed=TRUE)])
genes_Nucl <- names(fullAnnotation[grep(goterm_Nucl, fullAnnotation, fixed=TRUE)])
genes_TM <- names(fullAnnotation[grep(goterm_TM, fullAnnotation, fixed=TRUE)])
vennDiagram(vennCounts(cbind(Potassium = row.names(varPart) %in% genes_K,
Carboxilic = row.names(varPart) %in% genes_COOH,
Sulfate = row.names(varPart) %in% genes_Sulf,
Nucleoside = row.names(varPart) %in% genes_Nucl,
Transmembrane = row.names(varPart) %in% genes_TM)))
There is hardly any overlap among genes involved in the transport of these substances. The only remarkable overlap happens between potassium ion transport and ion transmembrane transport. But, still the significance of each term should be quite independent of the significance of any other term. This was expected, because the analysis of functional enrichment applied algorithms to account for the dependence structure among GO terms. I analyse them separately, below.
In summary, most genes involved in ion transport, and also those involved in nucleoside transport, are expressed at lower levels in resting eggs from populations submitted to the random regime.
goterm <- goterm_K
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
goterm <- goterm_TM
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
goterm <- goterm_Nucl
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
goterm <- goterm_COOH
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
Trehalose is a disaccharide that confers protection against dessication in several organisms (n.d.). Trehalose is present in higher concentrations in resting eggs of B. plicatilis (which resist dessication) than in asexual eggs (n.d.). While trehalose is a very relevant molecule for dormant embryos, it is unclear why genes involved in its biosynthesis are expressed more in eggs from regular than from unpredictable environments.
goterm <- 'GO:0005992'
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
Adult individuals of B. plicatilis have a crown of cilia. Thus, the expression of genes for cilium assembly, and probably also those for cilium movement and cell motility in the embryo must be related to embryonic development. They are, thus a sign that the embryos are leaving the dormant state and resuming metabolism. Under this light, it makes sense that eggs from a regular environment express these genes more than eggs from the unpredictable environment, because most of the former must be getting ready to hatch.
goterm_cellmoti <- 'GO:0048870' # cell motility
goterm_movement <- 'GO:0003341' # cilium movement
goterm_assembly <- 'GO:0060271' # cilium assembly
genes_cellmoti <- names(fullAnnotation[grep(goterm_cellmoti, fullAnnotation, fixed=TRUE)])
genes_movement <- names(fullAnnotation[grep(goterm_movement, fullAnnotation, fixed=TRUE)])
genes_assembly <- names(fullAnnotation[grep(goterm_assembly, fullAnnotation, fixed=TRUE)])
vennDiagram(vennCounts(cbind('Cell motility' = row.names(varPart) %in% genes_cellmoti,
'Cillium movement' = row.names(varPart) %in% genes_movement,
'Cillium assembly' = row.names(varPart) %in% genes_assembly)))
goterm <- goterm_cellmoti
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
goterm <- goterm_movement
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
goterm <- goterm_assembly
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
goterm <- 'GO:0043161'
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
goterm <- 'GO:0006979'
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
goterm <- 'GO:0003774'
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
goterm_metallocarboxy <- 'GO:0004181'
goterm_metalloendopep <- 'GO:0004222'
goterm_serinetypeendo <- 'GO:0004252'
goterm_calciumcystein <- 'GO:0004198'
genes_metallocarboxy <- names(fullAnnotation[grep(goterm_metallocarboxy, fullAnnotation, fixed=TRUE)])
genes_metalloendopep <- names(fullAnnotation[grep(goterm_metalloendopep, fullAnnotation, fixed=TRUE)])
genes_serinetypeendo <- names(fullAnnotation[grep(goterm_serinetypeendo, fullAnnotation, fixed=TRUE)])
genes_calciumcystein <- names(fullAnnotation[grep(goterm_calciumcystein, fullAnnotation, fixed=TRUE)])
vennDiagram(vennCounts(cbind(Metallocarboxy. = row.names(varPart) %in% genes_metallocarboxy,
Metalloendo. = row.names(varPart) %in% genes_metalloendopep,
Serine_type = row.names(varPart) %in% genes_serinetypeendo,
Cystein_type = row.names(varPart) %in% genes_calciumcystein)))
goterm <- 'GO:0004181'
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
goterm <- 'GO:0004252'
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
goterm <- 'GO:0004198'
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
goterm <- 'GO:0004222'
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
goterm <- 'GO:0016715'
genes <- getGenes(goterm)
plotPercentBars(varPart[genes[order(-varPart[genes, 'regime'])],
c('regime', 'treatment', 'population', 'regime:treatment', 'Residuals')])
volcanoPlot()
sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=ca_ES.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=es_ES.UTF-8 LC_COLLATE=ca_ES.UTF-8
## [5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=ca_ES.UTF-8
## [7] LC_PAPER=es_ES.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] stringr_1.4.0 GO.db_3.10.0 AnnotationDbi_1.48.0
## [4] IRanges_2.20.2 S4Vectors_0.24.3 variancePartition_1.16.1
## [7] Biobase_2.46.0 BiocGenerics_0.32.0 scales_1.1.0
## [10] foreach_1.4.7 limma_3.42.0 ggplot2_3.2.1
##
## loaded via a namespace (and not attached):
## [1] bit64_0.9-7 splines_3.6.2 gtools_3.8.1
## [4] assertthat_0.2.1 blob_1.2.1 yaml_2.2.0
## [7] progress_1.2.2 pillar_1.4.3 RSQLite_2.2.0
## [10] backports_1.1.5 lattice_0.20-38 glue_1.3.1
## [13] digest_0.6.23 minqa_1.2.4 colorspace_1.4-1
## [16] htmltools_0.4.0 Matrix_1.2-18 plyr_1.8.5
## [19] pkgconfig_2.0.3 purrr_0.3.3 gdata_2.18.0
## [22] BiocParallel_1.20.1 lme4_1.1-21 tibble_2.1.3
## [25] farver_2.0.3 withr_2.1.2 lazyeval_0.2.2
## [28] pbkrtest_0.4-7 magrittr_1.5 crayon_1.3.4
## [31] memoise_1.1.0 evaluate_0.14 doParallel_1.0.15
## [34] nlme_3.1-143 MASS_7.3-51.5 gplots_3.0.1.2
## [37] tools_3.6.2 prettyunits_1.1.0 hms_0.5.3
## [40] lifecycle_0.1.0 munsell_0.5.0 colorRamps_2.3
## [43] compiler_3.6.2 caTools_1.18.0 rlang_0.4.2
## [46] grid_3.6.2 nloptr_1.2.1 iterators_1.0.12
## [49] labeling_0.3 bitops_1.0-6 rmarkdown_2.1
## [52] boot_1.3-24 gtable_0.3.0 codetools_0.2-16
## [55] DBI_1.1.0 reshape2_1.4.3 R6_2.4.1
## [58] knitr_1.27 dplyr_0.8.3 bit_1.1-15.1
## [61] zeallot_0.1.0 KernSmooth_2.23-16 stringi_1.4.5
## [64] Rcpp_1.0.3 vctrs_0.2.1 tidyselect_0.2.5
## [67] xfun_0.12
Garcia-Roger, Eduardo M., Esther Lubzens, Diego Fontaneto, and Manuel Serra. 2019. “Facing Adversity: Dormant Embryos in Rotifers.” The Biological Bulletin 237 (2): 119–44. doi:10.1086/705701.
n.d.
n.d.